Differential Privacy and Statistical Disclosure Risk Measures: An Investigation with Binary Synthetic Data
Abstract
We compare the disclosure risk criterion of ε-differential privacy with a criterion based on probabilities that intruders uncover actual values given the released data. To do so, we generate fully synthetic data that satisfy ε-differential privacy at different levels of ε, make assumptions about the information available to intruders, and compute posterior probabilities of uncovering true values. The simulation results suggest that the two paradigms are not easily reconciled, since differential privacy is agnostic to the specific values in the observed data whereas probabilistic disclosure risk measures depend greatly on them. The results also suggest, perhaps surprisingly, that probabilistic disclosure risk measures can be small even when ε is large. Motivated by these findings, we present an alternative disclosure risk assessment approach that integrates some of the strong confidentiality protection features in ε-differential privacy with the interpretability and data-specific nature of probabilistic disclosure risk measures.
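The contrast drawn in the abstract can be made concrete with a small sketch. It is not the synthesizer or the risk measure used in the paper. It assumes a simple Laplace mechanism on the count of ones, followed by resampling, and an intruder who knows every record except the target's and holds a prior of 0.5 on the target's value. The function names (`dp_synthetic_binary`, `intruder_posterior`) and the toy data are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_binary(data, epsilon, rng):
    """Return a noisy count of ones and synthetic binary records drawn from it.

    Assumption: neighboring datasets differ by replacing one record, so the
    count has sensitivity 1 and Laplace(1/epsilon) noise gives epsilon-DP.
    """
    n = len(data)
    noisy_count = sum(data) + laplace_noise(1.0 / epsilon, rng)
    # Clamping and resampling are post-processing, which preserves epsilon-DP.
    p_hat = min(1.0, max(0.0, noisy_count / n))
    synthetic = [1 if rng.random() < p_hat else 0 for _ in range(n)]
    return noisy_count, synthetic


def intruder_posterior(noisy_count, others_sum, epsilon, prior=0.5):
    """Posterior Pr(target = 1 | noisy_count) for an intruder who knows the
    sum of all other records (a strong, assumed attack model)."""
    # Log-likelihood ratio of the Laplace densities centered at
    # others_sum + 1 (target = 1) and others_sum (target = 0).
    log_lr = epsilon * (abs(noisy_count - others_sum)
                        - abs(noisy_count - others_sum - 1))
    log_odds = math.log(prior / (1.0 - prior)) + log_lr
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    rng = random.Random(0)
    data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # toy data; record 0 is the target
    for eps in (0.1, 1.0, 5.0):
        noisy, synth = dp_synthetic_binary(data, eps, rng)
        post = intruder_posterior(noisy, sum(data[1:]), eps)
        bound = math.exp(eps) / (1.0 + math.exp(eps))
        print(f"eps={eps}: posterior={post:.3f}, worst-case bound={bound:.3f}")
```

With a prior of 0.5, ε-differential privacy caps the posterior at e^ε / (1 + e^ε) whatever the data are, while the posterior for a particular release depends on the noise draw and on the observed values. That gap is the tension the abstract describes. In this sketch the intruder sees the noisy count itself; an intruder who sees only the synthetic records learns no more than that.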
Related resources
How Protective Are Synthetic Data?
This short paper provides a synthesis of the statistical disclosure limitation and computer science data privacy approaches to measuring the confidentiality protections provided by fully synthetic data. Since all elements of the data records in the release file derived from fully synthetic data are sampled from an appropriate probability distribution, they do not represent “real data,” but ther...
A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data
Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a p...
PeGS: Perturbed Gibbs Samplers that Generate Privacy-Compliant Synthetic Data
This paper proposes a categorical data synthesizer algorithm that guarantees a quantifiable disclosure risk. Our algorithm, named Perturbed Gibbs Sampler (PeGS), can handle high-dimensional categorical data that are intractable if represented as contingency tables. PeGS involves three intuitive steps: 1) disintegration, 2) noise injection, and 3) synthesis. We first disintegrate the original dat...
Privacy and Statistical Risk: Formalisms and Minimax Bounds
We explore and compare a variety of definitions for privacy and disclosure limitation in statistical estimation and data analysis, including (approximate) differential privacy, testing-based definitions of privacy, and posterior guarantees on disclosure risk. We give equivalence results between the definitions, shedding light on the relationships between different formalisms for privacy. We also...
Transparency and Disclosure Risk in Data Privacy
k-Anonymity and differential privacy can be considered examples of Boolean definitions of disclosure risk. In contrast, record linkage and uniqueness are examples of quantitative measures of risk. Record linkage is a powerful approach because it can model different types of scenarios in which an adversary attacks a protected database with some information and background knowledge. Transparency ...
Journal title:
- Trans. Data Privacy
Volume 5, Issue
Pages -
Publication date: 2012